fast-import: disallow "." and ".." path components #1831

Open: wants to merge 1 commit into base: master

newren commented Nov 24, 2024

Changes since v1:

  • make use of is_dot_or_dotdot() from dir.h
  • fix style issue

cc: Eric Sunshine [email protected]
cc: Patrick Steinhardt [email protected]
cc: "Kristoffer Haugsbakk" [email protected]
cc: Jeff King [email protected]

newren commented Nov 25, 2024

/submit

gitgitgadget bot commented Nov 25, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1831/newren/disallow-dotdot-fast-import-v1

To fetch this version to local tag pr-1831/newren/disallow-dotdot-fast-import-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1831/newren/disallow-dotdot-fast-import-v1

gitgitgadget bot commented Nov 25, 2024

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Mon, Nov 25, 2024 at 12:58 PM Elijah Newren via GitGitGadget
<[email protected]> wrote:
> If a user specified e.g.
>    M 100644 :1 ../some-file
> then fast-import previously would happily create a git history where
> there is a tree in the top-level directory named "..", and with a file
> inside that directory named "some-file".  The top-level ".." directory
> causes problems.  While git checkout will die with errors and fsck will
> report hasDotdot problems, the user is going to have problems trying to
> remove the problematic file.  Simply avoid creating this bad history in
> the first place.
>
> Signed-off-by: Elijah Newren <[email protected]>
> ---
> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> @@ -1466,6 +1466,9 @@ static int tree_content_set(
>         e->name = to_atom(p, n);
> +       if (!strcmp(e->name->str_dat, ".") || !strcmp(e->name->str_dat, "..")) {
> +               die("path %s contains invalid component", p);
> +       }

Probably not worth a reroll, but is_dot_or_dotdot() might be usable here.

(And -- style nit -- the braces could be dropped.)
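
For readers unfamiliar with the helper mentioned above, the check can be sketched in standalone C as follows. This is only an illustration of what a "." / ".." component test is expected to do; the name `is_dot_or_dotdot_sketch` is hypothetical, and the body is an assumption written for this example rather than a copy of the helper in dir.h.

```c
/*
 * Hypothetical stand-in for the helper discussed above: returns 1 when
 * `name` is exactly "." or "..", and 0 for anything else (including
 * names that merely start with a dot, such as ".git").
 */
int is_dot_or_dotdot_sketch(const char *name)
{
	return name[0] == '.' &&
	       (name[1] == '\0' ||
		(name[1] == '.' && name[2] == '\0'));
}
```

Compared with two strcmp() calls, a helper like this keeps the condition to a single readable call at the place where the tree entry name is set.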

gitgitgadget bot commented Nov 25, 2024

User Eric Sunshine <[email protected]> has been added to the cc: list.

If a user specified e.g.
   M 100644 :1 ../some-file
then fast-import previously would happily create a git history where
there is a tree in the top-level directory named "..", and with a file
inside that directory named "some-file".  The top-level ".." directory
causes problems.  While git checkout will die with errors and fsck will
report hasDotdot problems, the user is going to have problems trying to
remove the problematic file.  Simply avoid creating this bad history in
the first place.

Signed-off-by: Elijah Newren <[email protected]>

gitgitgadget bot commented Nov 25, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Mon, Nov 25, 2024 at 10:15 AM Eric Sunshine <[email protected]> wrote:
>
> On Mon, Nov 25, 2024 at 12:58 PM Elijah Newren via GitGitGadget
> <[email protected]> wrote:
> > If a user specified e.g.
> >    M 100644 :1 ../some-file
> > then fast-import previously would happily create a git history where
> > there is a tree in the top-level directory named "..", and with a file
> > inside that directory named "some-file".  The top-level ".." directory
> > causes problems.  While git checkout will die with errors and fsck will
> > report hasDotdot problems, the user is going to have problems trying to
> > remove the problematic file.  Simply avoid creating this bad history in
> > the first place.
> >
> > Signed-off-by: Elijah Newren <[email protected]>
> > ---
> > diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> > @@ -1466,6 +1466,9 @@ static int tree_content_set(
> >         e->name = to_atom(p, n);
> > +       if (!strcmp(e->name->str_dat, ".") || !strcmp(e->name->str_dat, "..")) {
> > +               die("path %s contains invalid component", p);
> > +       }
>
> Probably not worth a reroll, but is_dot_or_dotdot() might be usable here.
>
> (And -- style nit -- the braces could be dropped.)

Good catches, thanks.  I think they are worth a reroll; I'll send one in.

gitgitgadget bot commented Nov 25, 2024

User Elijah Newren <[email protected]> has been added to the cc: list.

newren commented Nov 25, 2024

/submit

gitgitgadget bot commented Nov 25, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1831/newren/disallow-dotdot-fast-import-v2

To fetch this version to local tag pr-1831/newren/disallow-dotdot-fast-import-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1831/newren/disallow-dotdot-fast-import-v2

gitgitgadget bot commented Nov 26, 2024

This patch series was integrated into seen via git@0a88e9e.

@gitgitgadget gitgitgadget bot added the seen label Nov 26, 2024

gitgitgadget bot commented Nov 26, 2024

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Mon, Nov 25, 2024 at 07:00:48PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <[email protected]>
> 
> If a user specified e.g.
>    M 100644 :1 ../some-file
> then fast-import previously would happily create a git history where
> there is a tree in the top-level directory named "..", and with a file
> inside that directory named "some-file".  The top-level ".." directory
> causes problems.  While git checkout will die with errors and fsck will
> report hasDotdot problems, the user is going to have problems trying to
> remove the problematic file.  Simply avoid creating this bad history in
> the first place.

Makes sense.

More generally this made me wonder whether we should maybe extract some
bits out of "fsck.c" so that we don't have to duplicate the checks done
there in git-fast-import(1). This would for example include checks for
".git" and its HFS/NTFS variants as well as tree entry length checks for
names longer than 4096 characters.

This of course does not have to be part of your patch, which looks good
to me.

Thanks!

Patrick

gitgitgadget bot commented Nov 26, 2024

User Patrick Steinhardt <[email protected]> has been added to the cc: list.

gitgitgadget bot commented Nov 26, 2024

This patch series was integrated into seen via git@7ccbb69.

gitgitgadget bot commented Nov 26, 2024

This patch series was integrated into next via git@8b145bb.

@gitgitgadget gitgitgadget bot added the next label Nov 26, 2024

gitgitgadget bot commented Nov 27, 2024

On the Git mailing list, "Kristoffer Haugsbakk" wrote (reply to this):

Hi.  I see that this is in `next` now so the following might
be irrelevant.

On Mon, Nov 25, 2024, at 20:00, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <[email protected]>
> [...]
> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> index 76d5c20f141..995ef76f9d6 100644
> --- a/builtin/fast-import.c
> +++ b/builtin/fast-import.c
> @@ -1466,6 +1466,8 @@ static int tree_content_set(
>  		root->tree = t = grow_tree_content(t, t->entry_count);
>  	e = new_tree_entry();
>  	e->name = to_atom(p, n);
> +	if (is_dot_or_dotdot(e->name->str_dat))
> +		die("path %s contains invalid component", p);

Nit: single-quoting the path seems more common:

    $ git grep "\"path '%s'" ':!po/' | wc -l
    17
    $ git grep "\"path %s" ':!po/' | wc -l
    4

>  	e->versions[0].mode = 0;
>  	oidclr(&e->versions[0].oid, the_repository->hash_algo);
>  	t->entries[t->entry_count++] = e;
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 6224f54d4d2..caf3dc003a0 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -522,6 +522,26 @@ test_expect_success 'B: fail on invalid committer (5)' '
>  	test_must_fail git fast-import <input
>  '
>
> +test_expect_success 'B: fail on invalid file path' '
> +	cat >input <<-INPUT_END &&
> +	blob
> +	mark :1
> +	data <<EOF
> +	File contents
> +	EOF
> +
> +	commit refs/heads/badpath
> +	committer Name <email> $GIT_COMMITTER_DATE
> +	data <<COMMIT
> +	Commit Message
> +	COMMIT
> +	M 100644 :1 ../invalid-path

Maybe the test could be parameterized so that both `..` and `.` can
be tested?  Like in `test_path_eol_success`.

-- 
Kristoffer Haugsbakk

gitgitgadget bot commented Nov 27, 2024

User "Kristoffer Haugsbakk" <[email protected]> has been added to the cc: list.

gitgitgadget bot commented Nov 27, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Kristoffer Haugsbakk" <[email protected]> writes:

>> +	if (is_dot_or_dotdot(e->name->str_dat))
>> +		die("path %s contains invalid component", p);
>
> Nit: single-quoting the path seems more common:
>
>     $ git grep "\"path '%s'" ':!po/' | wc -l
>     17
>     $ git grep "\"path %s" ':!po/' | wc -l
>     4

Ah, I missed that one.  Thanks for catching.

We probably should write it down.

--- >8 ---
[PATCH] CodingGuidelines: a handful of error message guidelines

It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.

Let's write down established best practice we are aware of.

Signed-off-by: Junio C Hamano <[email protected]>
---

 * I am writing what I think is the established practice from
   memory; clarifications, corrections, and additions are all
   welcome.

 Documentation/CodingGuidelines | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
index 87904791cb..0444391983 100644
--- c/Documentation/CodingGuidelines
+++ w/Documentation/CodingGuidelines
@@ -703,16 +703,22 @@ Program Output
 
 Error Messages
 
- - Do not end error messages with a full stop.
+ - Do not end a single-sentence error message with a full stop.
 
  - Do not capitalize the first word, only because it is the first word
-   in the message ("unable to open %s", not "Unable to open %s").  But
+   in the message ("unable to open '%s'", not "Unable to open '%s'").  But
    "SHA-3 not supported" is fine, because the reason the first word is
    capitalized is not because it is at the beginning of the sentence,
    but because the word would be spelled in capital letters even when
    it appeared in the middle of the sentence.
 
- - Say what the error is first ("cannot open %s", not "%s: cannot open")
+ - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
+
+ - Enclose the subject of an error inside a pair of single quotes,
+   e.g. `die(_("unable to open '%s'"), path)`.
+
+ - Unless there is a compelling reason not to, error messages should
+   be marked for `_("translation")`.
 
 
 Externally Visible Names

gitgitgadget bot commented Nov 27, 2024

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Nov 26, 2024 at 07:57:57AM +0100, Patrick Steinhardt wrote:

> On Mon, Nov 25, 2024 at 07:00:48PM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <[email protected]>
> > 
> > If a user specified e.g.
> >    M 100644 :1 ../some-file
> > then fast-import previously would happily create a git history where
> > there is a tree in the top-level directory named "..", and with a file
> > inside that directory named "some-file".  The top-level ".." directory
> > causes problems.  While git checkout will die with errors and fsck will
> > report hasDotdot problems, the user is going to have problems trying to
> > remove the problematic file.  Simply avoid creating this bad history in
> > the first place.
> 
> Makes sense.
> 
> More generally this made me wonder whether we should maybe extract some
> bits out of "fsck.c" so that we don't have to duplicate the checks done
> there in git-fast-import(1). This would for example include checks for
> ".git" and its HFS/NTFS variants as well as tree entry length checks for
> names longer than 4096 characters.

I had the same thought, but I think the right code to be using is
verify_path(). That's what ultimately is used to let names into the
index from trees, from update-index, or from other tools like git-apply.

So I'd consider that authoritative, and fsck is mostly trying to follow
those rules while looking at only a single tree at a time. But
fast-import should have the whole path as a string, just like the index
code does.

-Peff
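
To illustrate the difference between checking one tree entry at a time and checking the whole path string, here is a minimal standalone sketch. It is not verify_path() and does not reproduce its rules; the function name `path_has_dot_component` is hypothetical, and the sketch covers only the "." / ".." case from this thread, not ".git" or its HFS/NTFS variants.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if any '/'-separated component of
 * `path` is exactly "." or "..", and 0 otherwise.
 */
int path_has_dot_component(const char *path)
{
	const char *p = path;

	while (*p) {
		/* Find the end of the current component. */
		const char *slash = strchr(p, '/');
		size_t len = slash ? (size_t)(slash - p) : strlen(p);

		if ((len == 1 && p[0] == '.') ||
		    (len == 2 && p[0] == '.' && p[1] == '.'))
			return 1;

		if (!slash)
			break;
		p = slash + 1; /* step past the separator */
	}
	return 0;
}
```

Because the caller holds the full path, a single pass like this can reject the input before any tree entries are created, which is the property argued for above.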

gitgitgadget bot commented Nov 27, 2024

User Jeff King <[email protected]> has been added to the cc: list.

gitgitgadget bot commented Nov 27, 2024

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Wed, Nov 27, 2024 at 8:23 AM Junio C Hamano <[email protected]> wrote:
> [PATCH] CodingGuidelines: a handful of error message guidelines
>
> It is more efficient to have something in the coding guidelines
> document to point at, when we want to review and comment on a new
> message in the codebase to make sure it "fits" in the set of
> existing messages.
>
> Let's write down established best practice we are aware of.
>
> Signed-off-by: Junio C Hamano <[email protected]>
> ---
> diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
> @@ -703,16 +703,22 @@ Program Output
>  Error Messages
>
> - - Do not end error messages with a full stop.
> + - Do not end a single-sentence error message with a full stop.
>
>   - Do not capitalize the first word, only because it is the first word
> -   in the message ("unable to open %s", not "Unable to open %s").  But
> +   in the message ("unable to open '%s'", not "Unable to open '%s'").  But
>     "SHA-3 not supported" is fine, because the reason the first word is
>     capitalized is not because it is at the beginning of the sentence,
>     but because the word would be spelled in capital letters even when
>     it appeared in the middle of the sentence.
>
> - - Say what the error is first ("cannot open %s", not "%s: cannot open")
> + - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
> +
> + - Enclose the subject of an error inside a pair of single quotes,
> +   e.g. `die(_("unable to open '%s'"), path)`.

These changes all seem fine.

> + - Unless there is a compelling reason not to, error messages should
> +   be marked for `_("translation")`.

We might want to spell this out more fully, such as stating that
messages from porcelain commands should be marked for translation, but
messages in plumbing should not. Also, perhaps mention explicitly that
BUG("message") should not be marked for translation since they are
intended to be read by Git developers, not by end-users.

gitgitgadget bot commented Nov 27, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <[email protected]> writes:

> I had the same thought, but I think the right code to be using is
> verify_path(). That's what ultimately is used to let names into the
> index from trees, from update-index, or from other tools like git-apply.

Yeah, I agree that is the right helper to use.

gitgitgadget bot commented Nov 28, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Taking input from comments by Eric (thanks) on the previous round,
this iteration adds a bit more about Porcelain/Plumbing and BUG().

  diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
  index 71e4742fd5..2b8f99f333 100644
  --- a/Documentation/CodingGuidelines
  +++ b/Documentation/CodingGuidelines
  @@ -703,8 +703,15 @@ Error Messages
    - Enclose the subject of an error inside a pair of single quotes,
      e.g. `die(_("unable to open '%s'"), path)`.
   
  - - Unless there is a compelling reason not to, error messages should
  -   be marked for `_("translation")`.
  + - Unless there is a compelling reason not to, error messages from the
  +   Porcelain command should be marked for `_("translation")`.
  +
  + - Error messages from the plumbing commands are sometimes meant for
  +   machine consumption and should not be marked for `_("translation")`
  +   to keep them 'grep'-able.
  +
  + - BUG("message") are for communicating the specific error to
  +   developers, and not to be translated.
   
   
   Externally Visible Names

--- >8 ---
It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.

Let's write down established best practice we are aware of.

Helped-by: Eric Sunshine <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
---
 Documentation/CodingGuidelines | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 3263245b03..2b8f99f333 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -689,16 +689,29 @@ Program Output
 
 Error Messages
 
- - Do not end error messages with a full stop.
+ - Do not end a single-sentence error message with a full stop.
 
  - Do not capitalize the first word, only because it is the first word
-   in the message ("unable to open %s", not "Unable to open %s").  But
+   in the message ("unable to open '%s'", not "Unable to open '%s'").  But
    "SHA-3 not supported" is fine, because the reason the first word is
    capitalized is not because it is at the beginning of the sentence,
    but because the word would be spelled in capital letters even when
    it appeared in the middle of the sentence.
 
- - Say what the error is first ("cannot open %s", not "%s: cannot open")
+ - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
+
+ - Enclose the subject of an error inside a pair of single quotes,
+   e.g. `die(_("unable to open '%s'"), path)`.
+
+ - Unless there is a compelling reason not to, error messages from the
+   Porcelain command should be marked for `_("translation")`.
+
+ - Error messages from the plumbing commands are sometimes meant for
+   machine consumption and should not be marked for `_("translation")`
+   to keep them 'grep'-able.
+
+ - BUG("message") are for communicating the specific error to
+   developers, and not to be translated.
 
 
 Externally Visible Names

-- 
2.47.1-499-g8536fed62d

gitgitgadget bot commented Nov 28, 2024

This branch is now known as en/fast-import-path-sanitize.

gitgitgadget bot commented Nov 28, 2024

This patch series was integrated into seen via git@66d1ef3.

gitgitgadget bot commented Nov 28, 2024

This patch series was integrated into seen via git@abced81.

gitgitgadget bot commented Nov 28, 2024

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Wed, Nov 27, 2024 at 7:36 PM Junio C Hamano <[email protected]> wrote:
> It is more efficient to have something in the coding guidelines
> document to point at, when we want to review and comment on a new
> message in the codebase to make sure it "fits" in the set of
> existing messages.
>
> Let's write down established best practice we are aware of.
>
> Helped-by: Eric Sunshine <[email protected]>
> Signed-off-by: Junio C Hamano <[email protected]>
> ---
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> @@ -689,16 +689,29 @@ Program Output
>  Error Messages
>
> - - Say what the error is first ("cannot open %s", not "%s: cannot open")
> + - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
> +
> + - Enclose the subject of an error inside a pair of single quotes,
> +   e.g. `die(_("unable to open '%s'"), path)`.
> +
> + - Unless there is a compelling reason not to, error messages from the
> +   Porcelain command should be marked for `_("translation")`.

Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.

> + - Error messages from the plumbing commands are sometimes meant for
> +   machine consumption and should not be marked for `_("translation")`
> +   to keep them 'grep'-able.

Using the same example, `_("translation")`, for both the "should be"
and "should not be" cases may very well confuse readers. (It certainly
confused me.) Perhaps mirroring the example of an item earlier in the
list would be clearer:

    - Unless there is a compelling reason not to, error messages from
      porcelain commands should be marked for translation, e.g.
      `die(_("bad revision"))`

    - Error messages from plumbing commands are sometimes meant for
      machine consumption, thus should not be marked for translation,
      e.g. `die("bad revision")`

> + - BUG("message") are for communicating the specific error to
> +   developers, and not to be translated.

Okay, although could be slightly more explicit:

    - BUG("message") is for communicating a specific failure to
      developers, not end-users, thus should not be translated.

gitgitgadget bot commented Nov 28, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Eric Sunshine <[email protected]> writes:

>> + - Unless there is a compelling reason not to, error messages from the
>> +   Porcelain command should be marked for `_("translation")`.
>
> Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.

;-) I think that is how we spell them in our documentation when we
contrast them against each other.

>> + - Error messages from the plumbing commands are sometimes meant for
>> +   machine consumption and should not be marked for `_("translation")`
>> +   to keep them 'grep'-able.
>
> Using the same example, `_("translation")`, for both the "should be"
> and "should not be" cases may very well confuse readers. (It certainly
> confused me.) Perhaps mirroring the example of an item earlier in the
> list would be clearer:
>
>     - Unless there is a compelling reason not to, error messages from
>       porcelain commands should be marked for translation, e.g.
>       `die(_("bad revision"))`
>
>     - Error messages from plumbing commands are sometimes meant for
>       machine consumption, thus should not be marked for translation,
>       e.g. `die("bad revision")`

Thanks, that is much better.  Let me steal it verbatim in the
hopefully final reroll.

>> + - BUG("message") are for communicating the specific error to
>> +   developers, and not to be translated.
>
> Okay, although could be slightly more explicit:
>
>     - BUG("message") is for communicating a specific failure to
>       developers, not end-users, thus should not be translated.

The way I read your rewrite is that the "communication" mentioned is
between the program and the user who saw the message.  I wanted to
say that the message is seen first by an end-user, and then is
communicated to developers.  And not translating is one way to make
sure the message is not mangled, and stays grep-able, during the
game of telephone.

Would this work better?

  - In order to help the user who saw BUG("message") to accurately
    communicate it to developers, do not mark them for translation.

Thanks.

gitgitgadget bot commented Nov 28, 2024

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Thu, Nov 28, 2024 at 4:28 AM Junio C Hamano <[email protected]> wrote:
> Eric Sunshine <[email protected]> writes:
> >> +   Porcelain command should be marked for `_("translation")`.
> >
> > Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.
>
> ;-) I think that is how we spell them in our documentation when we
> contrast them against each other.

I must not have been paying close enough attention.

> >> + - BUG("message") are for communicating the specific error to
> >> +   developers, and not to be translated.
> >
> > Okay, although could be slightly more explicit:
> >
> >     - BUG("message") is for communicating a specific failure to
> >       developers, not end-users, thus should not be translated.
>
> The way I read your rewrite is that the "communication" mentioned is
> between the program and the user who saw the message.  I wanted to
> say that the message is seen first by an end-user, and then is
> communicated to developers.  And not translating is one way to make
> sure the message is not mangled, and stays grep-able, during the
> game of telephone.
>
> Would this work better?
>
>   - In order to help the user who saw BUG("message") to accurately
>     communicate it to developers, do not mark them for translation.

Let's not spend too much time fine-tuning this. I found your original
clearer than this rewrite. It was just the "and not to be" bit that
made my reading hiccup. Taking your original but substituting in
"thus" may help:

    - BUG("message") are for communicating the specific error to
      developers, thus should not be translated.
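
The message conventions discussed in this thread (lowercase first word, no trailing full stop, subject enclosed in single quotes) can be sketched in standalone C as below. The helper name `format_invalid_path_msg` and its return convention are hypothetical, and the message wording simply applies the single-quote nit to the text from the patch; it is not claimed to be the wording that was merged. Translation marking is left out because the sketch has no gettext dependency.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes an error message about `path` into `buf`.
 * Returns 0 on success, or -1 if `buf` (of `size` bytes) is too small
 * to hold the whole message, in which case `buf` is left untouched.
 */
int format_invalid_path_msg(char *buf, size_t size, const char *path)
{
	/* sizeof() of the fixed text includes the terminating NUL. */
	size_t needed = strlen(path) +
			sizeof("path '' contains invalid component");

	if (needed > size)
		return -1;

	/* Lowercase start, quoted subject, no trailing full stop. */
	snprintf(buf, size, "path '%s' contains invalid component", path);
	return 0;
}
```

Quoting the path makes it easier to spot leading or trailing whitespace and unusual components such as ".." in the reported value.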
